[BugFix] Fix routed_scaling_factor_learnable not taking effect in cutlass backend apply_tp#7903
Thanks for your contribution!
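CI report generated from the following code (refreshed every 30 minutes):
1 Task overview
No required task has failed so far; 1 task is still running and 0 are waiting. Wait until all required tasks have finished before deciding whether the PR can be merged. The 3 currently failed tasks are all optional; they do not block merging and are listed for reference only.
2 Task status summary
Log column note: failed tasks directly use
2.1 Required tasks: 9/10 passed
2.2 Optional tasks: 27/31 passed
3 Failure details (required only)
No required task has failed, so no in-depth failure analysis was triggered in this round. The main required test task is still running; wait for the next status update.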
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## develop #7903 +/- ##
==========================================
Coverage ? 63.78%
==========================================
Files ? 467
Lines ? 64959
Branches ? 9961
==========================================
Hits ? 41437
Misses ? 20715
Partials ? 2807
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-05-25 15:49:13
📋 Review summary
PR overview: Fixes a bug where MoE Learnable Score (routed_scaling_factor_learnable) did not take effect in the apply_tp path of the cutlass backend.
Scope of changes: fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py
Impact tags: [BugFix] [OP]
Issues
| Severity | File | Summary |
|---|---|---|
| 🟡 Suggestion | fused_moe_cutlass_backend.py | MoE multi-backend sync: do other backends have the same bug? |
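📝 PR convention check
The PR title lacks an official tag, and every section of the description is either empty or a placeholder; both need to be filled in.
Suggested title (can be copied directly):
[BugFix] Fix routed_scaling_factor_learnable not taking effect in cutlass backend apply_tp
Suggested PR description (can be copied directly; it must reproduce the full structure of the checklist §D2 template):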
## Motivation
In the non-EP path of `apply_tp` in `fused_moe_cutlass_backend.py`, the per-expert scale logic for `routed_scaling_factor_learnable` runs after the `get_moe_scores` call, but `moe_expert_dispatch` recomputes `topk_weights` and `topk_idx` internally, so the learnable scale is overwritten and never takes effect.
## Modifications
- `fused_moe_cutlass_backend.py`: move the `routed_scaling_factor_learnable` branch from after the `get_moe_scores` call to after the `moe_expert_dispatch` call, so that the per-expert scale is applied to the final `topk_weights`; also rename the `topk_weights`/`topk_idx` return values of `get_moe_scores` to `_`/`__` so they do not clash with the same-named variables returned by `moe_expert_dispatch`.
## Usage or Command
N/A
## Accuracy Tests
N/A
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
The fix is sound: moe_expert_dispatch regenerates topk_weights/topk_idx internally, so a learnable scale applied right after get_moe_scores was overwritten later; only by moving it after moe_expert_dispatch does it take effect. The author should check whether other MoE backends (deepgemm, triton, etc.) have the same problem and fix them in sync where appropriate.
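The ordering problem described in this review can be reproduced with a toy model. The sketch below is plain Python and does not call FastDeploy or Paddle; `toy_topk`, `toy_dispatch`, and `apply_scale` are hypothetical stand-ins, assuming only that the dispatch step recomputes routing from the raw gate scores.

```python
# Toy model of the ordering bug. None of these helpers are FastDeploy APIs.

def toy_topk(gate_row, k):
    """Return (indices, weights) of the k largest gate scores in one row."""
    order = sorted(range(len(gate_row)), key=lambda e: gate_row[e], reverse=True)[:k]
    return order, [gate_row[e] for e in order]


def toy_dispatch(gate_row, k):
    """Stand-in for a dispatch op that recomputes routing from raw gate scores."""
    return toy_topk(gate_row, k)


def apply_scale(idx, weights, per_expert_scale):
    """Multiply each routing weight by the scale of the expert it points to."""
    return [w * per_expert_scale[e] for e, w in zip(idx, weights)]


gate_row = [0.1, 0.6, 0.3]
per_expert_scale = [1.0, 2.0, 0.5]

# Old order: scale first, then dispatch recomputes and discards the scaled weights.
idx, weights = toy_topk(gate_row, 2)
weights = apply_scale(idx, weights, per_expert_scale)
idx, weights = toy_dispatch(gate_row, 2)
buggy_weights = weights

# New order: scale the weights that dispatch actually returns.
idx, weights = toy_dispatch(gate_row, 2)
fixed_weights = apply_scale(idx, weights, per_expert_scale)
```

In the old order the scaled weights are simply rebound by the dispatch result, so `buggy_weights` equals the unscaled routing weights; in the new order the scale survives.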
if layer.routed_scaling_factor_learnable:
    safe_topk_indices = paddle.clip(topk_idx, min=0)
    gathered_scales = F.embedding(safe_topk_indices, layer.per_expert_scale.unsqueeze(1)).squeeze(-1)
    topk_weights = topk_weights * gathered_scales
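To make the clip, gather, and multiply steps in the snippet above concrete, here is a minimal pure-Python sketch on nested lists. It is not the Paddle code: `gather_scales` and `scale_topk_weights` are hypothetical names, and treating a negative expert id as a placeholder slot with zero weight is an assumption, not something the PR states.

```python
def gather_scales(topk_idx, per_expert_scale):
    """Look up one scale per selected expert, shape [tokens][k]."""
    # Clamp negative ids to 0 so the lookup always hits a valid row
    # (a plain Python list would otherwise wrap around silently).
    return [[per_expert_scale[max(e, 0)] for e in row] for row in topk_idx]


def scale_topk_weights(topk_weights, topk_idx, per_expert_scale):
    """Element-wise product of routing weights and gathered per-expert scales."""
    scales = gather_scales(topk_idx, per_expert_scale)
    return [
        [w * s for w, s in zip(w_row, s_row)]
        for w_row, s_row in zip(topk_weights, scales)
    ]


# Two tokens, top-2 routing over three experts.
# Assumption: -1 marks an unused slot whose weight is already 0.
topk_idx = [[1, 2], [0, -1]]
topk_weights = [[0.5, 0.25], [1.0, 0.0]]
per_expert_scale = [2.0, 4.0, 8.0]

scaled = scale_topk_weights(topk_weights, topk_idx, per_expert_scale)
# scaled == [[2.0, 2.0], [2.0, 0.0]]
```

Under that assumption the clamped slot picks up expert 0's scale, but its zero weight keeps the product at zero, so the clamp only protects the lookup.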
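🟡 Suggestion: MoE multi-backend sync check
The current fix touches only fused_moe_cutlass_backend.py, but FastDeploy has several MoE backend implementations (fused_moe_deepgemm_backend.py, fused_moe_triton_backend.py, fused_moe_marlin_backend.py, etc.). If any of these backends contains similar routed_scaling_factor_learnable logic that is likewise placed after get_moe_scores (rather than after moe_expert_dispatch), the same bug may be present; recommend auditing them and fixing in sync.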
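One coarse way to triage the other backends is a text scan that flags any source in which the learnable-scale branch appears before a `moe_expert_dispatch` call. The sketch below is a heuristic only: `scale_precedes_dispatch` is a hypothetical helper, the `if layer.routed_scaling_factor_learnable` pattern is taken from the snippet in this review, and whether the other backends call `moe_expert_dispatch` at all is an assumption to verify by reading the code.

```python
import re

# Pattern taken from the reviewed snippet; other backends may spell it differently.
FLAG_RE = re.compile(r"\bif\s+layer\.routed_scaling_factor_learnable\b")
DISPATCH_RE = re.compile(r"\bmoe_expert_dispatch\s*\(")


def scale_precedes_dispatch(source: str) -> bool:
    """Return True if a dispatch call follows the first learnable-scale branch.

    Purely textual: it ignores function boundaries and control flow,
    so a hit is only a prompt for manual review, not proof of a bug.
    """
    flag = FLAG_RE.search(source)
    if flag is None:
        return False
    return any(m.start() > flag.start() for m in DISPATCH_RE.finditer(source))


buggy_src = """
topk_weights, topk_idx = get_moe_scores(gate_out)
if layer.routed_scaling_factor_learnable:
    topk_weights = topk_weights * gathered_scales
out = moe_expert_dispatch(x, gate_out)
"""

fixed_src = """
_, __ = get_moe_scores(gate_out)
out = moe_expert_dispatch(x, gate_out)
if layer.routed_scaling_factor_learnable:
    topk_weights = topk_weights * gathered_scales
"""

suspicious = scale_precedes_dispatch(buggy_src)  # old ordering is flagged
clean = scale_precedes_dispatch(fixed_src)       # new ordering is not
```

To use it on the repository, read each backend file's text and pass it to `scale_precedes_dispatch`; any file it flags still needs a manual look at the actual call order inside `apply_tp`.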
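## Motivation
In the non-EP path of `apply_tp` in `fused_moe_cutlass_backend.py`, the per-expert scale logic for `routed_scaling_factor_learnable` runs after the `get_moe_scores` call, but `moe_expert_dispatch` recomputes `topk_weights` and `topk_idx` internally, so the learnable scale is overwritten and never takes effect.
## Modifications
- `fused_moe_cutlass_backend.py`: move the `routed_scaling_factor_learnable` branch from after the `get_moe_scores` call to after the `moe_expert_dispatch` call, so that the per-expert scale is applied to the final `topk_weights`; also rename the `topk_weights`/`topk_idx` return values of `get_moe_scores` to `_`/`__` so they do not clash with the same-named variables returned by `moe_expert_dispatch`.
## Usage or Command
## Accuracy Tests
## Checklist
- Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- Format your code, run `pre-commit` before commit.
- Add unit tests. Please write the reason in this PR if no unit tests.
- Provide accuracy results.
- If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.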